Entity coherence for descriptive text structuring
نویسنده
چکیده
Although entity coherence, i.e. the coherence that arises from certain patterns of references to entities, is of attested importance for characterising a descriptive text structure, whether and how current formal models of entity coherence such as Centering Theory can be used for the purposes of natural language generation remains unclear. This thesis investigates this issue and sets out to explore which of the many formulations of Centering best suits text structuring. In doing this, we assume text structuring to be a search task where different orderings of propositions are evaluated according to scores assigned by a metric. The main question behind this study is how to choose a metric of entity coherence among many alternatives as the only guidance to the text structuring component of a system that produces descriptions of objects. Different ways of defining metrics of entity coherence using Centering’s notions are discussed and a general corpus-based methodology is introduced to identify which of these metrics constitute the most promising candidates for search-based text structuring before the actual generation of the descriptive structure takes place. The performance of a large set of metrics is estimated empirically in a series of computational experiments using two kinds of data: (i) a reliably annotated corpus representing the genre of interest and (ii) data derived from an existing natural language generation system and ordered according to the instructions of a domain expert. A final experiment supplements our main methodology by automatically evaluating the best scoring orderings of some of the best performing metrics in comparison to an upper bound defined by orderings produced by multiple experts on additional application-specific data and a lower bound defined by a random baseline. The main findings are summarised as follows: In general, the simplest metric of entity coherence constitutes a very robust baseline for both datasets. However, when the metrics are modified according to an additional constraint on entity coherence, then the baseline is beaten in domain (ii). The employed modification is supported by the subsidiary evaluation which renders all employed metrics superior to the random baseline and helps identify the metric which overall constitutes the most suitable candidate (among the ones investigated) for search-based descriptive text structuring in domain (ii). This thesis provides substantial insight into the role of entity coherence as a descriptive text structuring constraint. Viewing Centering from an NLG perspective raises a series of interesting challenges that the thesis identifies and attempts to investigate to a certain extent. The general evaluation methodology and the results of the empirical studies are useful for any subsequent attempt to generate a descriptive text structure in the context of an application that makes use of the notion of entity coherence as modelled by Centering.
منابع مشابه
An Optimal Approach to Local and Global Text Coherence Evaluation Combining Entity-based, Graph-based and Entropy-based Approaches
Text coherence evaluation becomes a vital and lovely task in Natural Language Processing subfields, such as text summarization, question answering, text generation and machine translation. Existing methods like entity-based and graph-based models are engaging with nouns and noun phrases change role in sequential sentences within short part of a text. They even have limitations in global coheren...
متن کاملDual function of first position nominal groups in research article titles: Describing methods and structuring summary
Previous research has identified the nominal group as the most distinctive feature of the research article title. In contrast, the findings reported in this paper suggest Theme/Rheme is the dominant structure in title text. Theme/Rheme structures order and tie nominal groups in titles. When a title starts with a methodological term the first position nominal group acts as a theme marker. Thus, ...
متن کاملEvaluating Centering-Based Metrics of Coherence
We use a reliably annotated corpus to compare metrics of coherence based on Centering Theory with respect to their potential usefulness for text structuring in natural language generation. Previous corpus-based evaluations of the coherence of text according to Centering did not compare the coherence of the chosen text structure with that of the possible alternatives. A corpusbased methodology i...
متن کاملDefeasible Rules in Content Selection and Text Structuring
This paper outlines a number of ways in which defeasible rules can contribute to the content selection and discourse structuring components of a text generation system. We suggest that, for certain types of descriptive text, the characterisation of discourse struc-turing mechanisms as operations on or involving defeasible rules provides an attractive framework for addressing important issues in...
متن کامل